Darwini: Generating realistic large-scale social graphs
Abstract
Synthetic graph generators facilitate research in graph algorithms and processing systems by providing access to data, for instance graphs resembling social networks, while circumventing privacy and security concerns. However, their practical value lies in their ability to capture important metrics of real graphs, such as degree distribution and clustering properties. Graph generators must also be able to produce such graphs at the scale of real-world industry graphs, that is, hundreds of billions or trillions of edges. In this paper, we propose Darwini, a graph generator that captures a number of core characteristics of real graphs. Importantly, given a source graph, it can reproduce the degree distribution and, unlike existing approaches, the local clustering coefficient and joint-degree distributions. Furthermore, Darwini maintains metrics such as node PageRank, eigenvalues, and the k-core decomposition of the source graph. Comparing Darwini with state-of-the-art generative models, we show that it reproduces these characteristics more accurately. Finally, we provide an open-source implementation of our approach on the vertex-centric Apache Giraph model that allows us to create synthetic graphs with one trillion edges.
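The abstract names the structural properties that Darwini is evaluated on. The following minimal Python sketch computes four of them for a small undirected graph given as an edge list: degree distribution, local clustering coefficient, joint-degree distribution, and k-core numbers. It is for illustration only and is not Darwini's generation algorithm, which the abstract does not describe. All function names are hypothetical. The core-number routine uses simple quadratic-time peeling, which is adequate for toy inputs but not for the trillion-edge scale the paper targets.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets; self-loops and duplicate edges are ignored."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def degree_distribution(adj):
    """Map each degree value to the number of vertices having that degree."""
    return Counter(len(neigh) for neigh in adj.values())


def local_clustering(adj):
    """C_v = 2 * (links among v's neighbors) / (k_v * (k_v - 1))."""
    coeffs = {}
    for v, neigh in adj.items():
        k = len(neigh)
        if k < 2:
            coeffs[v] = 0.0  # undefined for degree < 2; set to 0 by convention here
            continue
        links = sum(1 for a, b in combinations(neigh, 2) if b in adj[a])
        coeffs[v] = 2.0 * links / (k * (k - 1))
    return coeffs


def joint_degree_distribution(adj):
    """Count edges by the (smaller, larger) degrees of their two endpoints."""
    jdd = Counter()
    for u, neigh in adj.items():
        for v in neigh:
            if u < v:  # visit each undirected edge exactly once
                du, dv = len(adj[u]), len(adj[v])
                jdd[(min(du, dv), max(du, dv))] += 1
    return jdd


def core_numbers(adj):
    """k-core number of each vertex by repeatedly peeling a min-degree vertex."""
    degree = {v: len(neigh) for v, neigh in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])  # the core level never decreases while peeling
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


if __name__ == "__main__":
    adj = build_adjacency([(0, 1), (1, 2), (2, 0), (2, 3)])
    print(dict(degree_distribution(adj)))
    print(local_clustering(adj))
    print(dict(joint_degree_distribution(adj)))
    print(core_numbers(adj))
```

Take the edge list `[(0, 1), (1, 2), (2, 0), (2, 3)]`, a triangle with one pendant vertex. Vertices 0 and 1 have clustering 1.0 and vertex 2 has 1/3. The pendant vertex 3 gets 0.0 under the convention above. The triangle's vertices have core number 2, while vertex 3 has core number 1. A generator that preserves these quantities matches the source graph on more than its degree sequence alone.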
Similar resources
Tree decompositions and social graphs
Recent work has established that large informatics graphs such as social and information networks have a nontrivial treelike structure when viewed at moderate-size scales. Here, we present results from the first detailed empirical evaluation of the use of tree decomposition (TD) heuristics for structure identification and extraction in social graphs. Although TDs have historically been used in ...
Parallel Generation of Massive Scale-Free Graphs
One of the biggest hurdles faced by researchers studying algorithms for massive graphs is the lack of large input graphs that are essential for the development and testing of the graph algorithms. This paper proposes two efficient and highly scalable parallel graph generation algorithms that can produce massive realistic graphs to address this issue. The algorithms, designed to achieve high degree...
Scaling Collaborative Filtering to Large-Scale Bipartite Rating Graphs Using Lenskit and Spark
Popular social networking applications such as Facebook, Twitter, Friendster, etc. generate very large graphs with different characteristics. These social networks are huge, comprising millions of nodes and edges that push existing graph mining algorithms and architectures to their limits. In product-rating graphs, users connect with each other and rate items in tandem. In such bipartite graphs...
Scale Free Interval Graphs
Scale free graphs have attracted attention by their non-uniform structure that can be used as a model for various social and physical networks. In this paper, we propose a natural and simple random model for generating scale free interval graphs. The model generates a set of intervals randomly under a certain distribution, which defines a random interval graph. The main advantage of the model i...
Synthetic Graph Generation from Finely-Tuned Temporal Constraints
Large-scale graphs are at the core of a plethora of modern applications such as social networks, transportation networks, or the Semantic Web. Such graphs naturally evolve over time, which makes graph processing tasks, e.g., graph mining, particularly challenging. To be able to realize rigorous empirical evaluations of research ideas, the graph processing community needs finely-tuned genera...
Journal title:
- CoRR
Volume abs/1610.00664
Publication year 2016